Abstract



This note gives a simple analysis of a randomized approximation scheme for matrix multiplication proposed by [Sar06] based on a random rotation followed by uniform column sampling. The result follows from a matrix version of Bernstein's inequality and a tail inequality for quadratic forms in subgaussian random vectors.

1 Introduction 

Let $A := [a_1|a_2|\cdots|a_m] \in \mathbb{R}^{d_A \times m}$ and $B := [b_1|b_2|\cdots|b_m] \in \mathbb{R}^{d_B \times m}$ be fixed matrices, each with $m$ columns. If $m$ is very large, then the straightforward computation of the matrix product $AB^\top$ (with $\Omega(d_A d_B m)$ operations) can be prohibitive.

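We can instead approximate the product using the following randomized scheme. Let $\Theta \in \mathbb{R}^{m \times m}$ be a random orthogonal matrix; the distribution of $\Theta$ will be specified later in Theorem 1, but a key property of $\Theta$ will be that the matrix products

$$\hat{A} := A\Theta \quad \text{and} \quad \hat{B} := B\Theta$$

can be computed with $O((d_A + d_B)\, m \log m)$ operations. Given the products $\hat{A} = [\hat{a}_1|\hat{a}_2|\cdots|\hat{a}_m]$ and $\hat{B} = [\hat{b}_1|\hat{b}_2|\cdots|\hat{b}_m]$, we take a small uniform random sample of pairs of their columns (drawn with replacement)

$$(\hat{a}_{i_1}, \hat{b}_{i_1}), (\hat{a}_{i_2}, \hat{b}_{i_2}), \ldots, (\hat{a}_{i_n}, \hat{b}_{i_n})$$

and then compute the sum of outer products

$$\widehat{AB^\top} := \frac{m}{n} \sum_{j=1}^{n} \hat{a}_{i_j} \hat{b}_{i_j}^\top.$$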

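It is easy to check that $\widehat{AB^\top}$ is an unbiased estimator of $AB^\top$: for a uniformly random index $i$, $\mathbb{E}[m\, \hat{a}_i \hat{b}_i^\top] = \hat{A}\hat{B}^\top = A\Theta\Theta^\top B^\top = AB^\top$. The sum can be computed from $\hat{A}$ and $\hat{B}$ with $O(d_A d_B n)$ operations, so overall, the estimate $\widehat{AB^\top}$ can be computed with $O(d_A d_B n + (d_A + d_B)\, m \log m)$ operations. (In fact, the $\log m$ can be replaced by $\log n$ [AL08].) Therefore, we would like $n$ to be as small as possible so that, with high probability, $\|\widehat{AB^\top} - AB^\top\| \le \varepsilon \|A\| \|B\|$ for some error $\varepsilon > 0$, where $\|\cdot\|$ denotes the spectral norm. As shown in Theorem 1, it suffices (treating the failure probability as a constant) to have

$$n = O\left(\frac{(k + \log m) \log k}{\varepsilon^2}\right)$$

where $k := \max\{\operatorname{tr}(A^\top A)/\|A\|^2, \operatorname{tr}(B^\top B)/\|B\|^2\} \le \max\{\operatorname{rank}(A), \operatorname{rank}(B)\}$.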
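As an illustration only, the scheme above can be sketched in plain Python. The sketch assumes that $m$ is a power of two and takes $H$ to be the Sylvester Hadamard matrix, applied through a fast Walsh-Hadamard transform so that each row of $A$ and $B$ is rotated with $O(m \log m)$ operations. It is not taken from [Sar06], and the function names `fwht`, `rotate`, and `approx_product` are hypothetical.

```python
import math
import random


def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of two.

    Returns H v for the Sylvester Hadamard matrix H. Because this H is symmetric,
    the result also equals v^T H. Uses O(m log m) additions and subtractions.
    """
    v = list(v)
    m, h = len(v), 1
    while h < m:
        for start in range(0, m, 2 * h):
            for j in range(start, start + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    return v


def rotate(M, signs):
    """Return M Theta with Theta = D H / sqrt(m) and D = diag(signs), one row at a time."""
    scale = 1.0 / math.sqrt(len(signs))
    return [[x * scale for x in fwht([a * s for a, s in zip(row, signs)])] for row in M]


def approx_product(A, B, n, rng):
    """Estimate A B^T as (m/n) times a sum of n outer products of sampled column pairs."""
    m = len(A[0])
    signs = [rng.choice((-1.0, 1.0)) for _ in range(m)]  # Rademacher diagonal of D
    A_hat, B_hat = rotate(A, signs), rotate(B, signs)  # the same Theta for A and B
    est = [[0.0] * len(B) for _ in range(len(A))]
    for _ in range(n):
        i = rng.randrange(m)  # uniform column index, drawn with replacement
        for p, a_row in enumerate(A_hat):
            for q, b_row in enumerate(B_hat):
                est[p][q] += a_row[i] * b_row[i]
    return [[(m / n) * x for x in row] for row in est]
```

The same sign vector is used for both rotations because unbiasedness relies on $\hat{A}\hat{B}^\top = A\Theta\Theta^\top B^\top = AB^\top$; rotating $A$ and $B$ with independent signs would make the expected estimate zero rather than $AB^\top$.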

A flawed analysis of a different scheme based on non-uniform column sampling (without a random rotation $\Theta$) was given in [HKZ12a]; that analysis gave an incorrect bound on $\|\mathbb{E}[X^2]\|$ for a certain random symmetric matrix $X$. A different analysis of this non-uniform sampling scheme can be found in [MZ11], but that analysis has some deficiencies as pointed out in [HKZ12a]. The scheme studied in the present work, which employs a certain random rotation followed by uniform column sampling, was proposed by [Sar06], and is based on the Fast Johnson-Lindenstrauss Transform of [AC09]. The analysis given in [Sar06] bounds the Frobenius norm error; in this work, we bound the spectral norm error. A similar but slightly looser analysis of spectral norm error was very recently provided in [ABTZ12].



2 Analysis 

Let $[m] := \{1, 2, \ldots, m\}$.

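Theorem 1. Pick any $\delta \in (0, 1/3)$, and let $k := \max\{\operatorname{tr}(AA^\top)/\|A\|^2, \operatorname{tr}(BB^\top)/\|B\|^2\}$ (note that $k \le \max\{\operatorname{rank}(A), \operatorname{rank}(B)\}$). Assume $\Theta = \frac{1}{\sqrt{m}} DH$, where $D = \operatorname{diag}(\epsilon)$, $\epsilon \in \{\pm 1\}^m$ is a vector of independent Rademacher random variables, and $H \in \{\pm 1\}^{m \times m}$ is a Hadamard matrix. With probability at least $1 - \delta$,

$$\|\widehat{AB^\top} - AB^\top\| \le \|A\| \|B\| \left( \sqrt{\frac{4\left(k + 2\sqrt{k \ln(3m/\delta)} + 2\ln(3m/\delta) + 1\right) \ln(6k/\delta)}{n}} + \frac{2\left(k + 2\sqrt{k \ln(3m/\delta)} + 2\ln(3m/\delta) + 1\right) \ln(6k/\delta)}{3n} \right).$$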
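The right-hand side of Theorem 1 can be evaluated numerically to see what sample size $n$ it calls for. The following is a minimal sketch with hypothetical function names: `theorem1_eps` computes the factor multiplying $\|A\|\|B\|$, and `samples_needed` finds the smallest $n$ that brings it below a target $\varepsilon$, using the fact that this factor decreases in $n$.

```python
import math


def theorem1_eps(k, m, n, delta):
    """Relative spectral-norm error bound of Theorem 1 (the factor multiplying ||A|| ||B||)."""
    if not 0 < delta < 1 / 3:
        raise ValueError("Theorem 1 requires delta in (0, 1/3)")
    L = math.log(3 * m / delta)
    mu_plus_1 = k + 2 * math.sqrt(k * L) + 2 * L + 1  # high-probability bound on mu + 1
    c = math.log(6 * k / delta)
    return math.sqrt(4 * mu_plus_1 * c / n) + 2 * mu_plus_1 * c / (3 * n)


def samples_needed(k, m, delta, eps):
    """Smallest n with theorem1_eps(k, m, n, delta) <= eps, by doubling then bisection."""
    hi = 1
    while theorem1_eps(k, m, hi, delta) > eps:
        hi *= 2
    lo = hi // 2  # invariant: lo == 0 or the bound fails at lo; the bound holds at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if theorem1_eps(k, m, mid, delta) > eps:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, `samples_needed(10, 2**20, 0.01, 0.1)` returns the sample size that Theorem 1 certifies for $k = 10$, $m = 2^{20}$, $\delta = 0.01$ and $\varepsilon = 0.1$. Note that $k \ge 1$ always holds, since $\operatorname{tr}(AA^\top) \ge \|A\|^2$.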

The proof of Theorem 1 is a consequence of the following lemmas, combined with a union bound. Lemma 1 bounds the error in terms of a certain quantity $\mu$ which depends on the random orthogonal matrix $\Theta$ (and $A$ and $B$). Lemma 2 gives a bound on $\mu$ that holds with high probability over the random choice of $\Theta$.

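Lemma 1. Define $Q = [q_1|q_2|\cdots|q_m] := \|A\|^{-1} A\Theta$, $R = [r_1|r_2|\cdots|r_m] := \|B\|^{-1} B\Theta$, $k_A := \operatorname{tr}(QQ^\top) = \operatorname{tr}(AA^\top)/\|A\|^2$, $k_B := \operatorname{tr}(RR^\top) = \operatorname{tr}(BB^\top)/\|B\|^2$, and

$$\mu := m \max\left(\{\|q_i\|^2 : i \in [m]\} \cup \{\|r_i\|^2 : i \in [m]\}\right).$$

Then, for any $t > 0$,

$$\Pr\left[ \|\widehat{AB^\top} - AB^\top\| > \|A\|\|B\| \left( \sqrt{\frac{2(\mu+1)t}{n}} + \frac{(\mu+1)t}{3n} \right) \right] \le 2\sqrt{k_A k_B} \cdot \frac{t}{e^t - t - 1}.$$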
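The quantities in Lemma 1 are easy to compute once $A\Theta$ and $B\Theta$ are available. In the sketch below, the spectral norm is approximated by plain power iteration (a simple stand-in for an exact SVD, assuming the all-ones start vector is not orthogonal to the top singular vector), and the helper names are hypothetical. Since $\mu \ge \sum_i \|q_i\|^2 = k_A$ and likewise $\mu \ge k_B$, the smallest value $\mu$ can take is $\max\{k_A, k_B\}$.

```python
import math


def spectral_norm(M, iters=500):
    """Approximate largest singular value of M by power iteration on M M^T."""
    d, m = len(M), len(M[0])
    v = [1.0] * d  # fixed start vector; assumed not orthogonal to the top left singular vector
    lam = 0.0
    for _ in range(iters):
        u = [sum(M[p][j] * v[p] for p in range(d)) for j in range(m)]  # M^T v
        w = [sum(M[p][j] * u[j] for j in range(m)) for p in range(d)]  # M M^T v
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        lam = norm_w / math.sqrt(sum(x * x for x in v))  # estimate of ||M||^2
        v = [x / norm_w for x in w]
    return math.sqrt(lam)


def lemma1_quantities(A_rot, B_rot):
    """Return (mu, k_A, k_B) given the rotated matrices A Theta and B Theta.

    Since Theta is orthogonal, ||A Theta|| = ||A|| and tr(A Theta Theta^T A^T) = tr(A A^T),
    so Q = A Theta / ||A Theta|| and R = B Theta / ||B Theta||.
    """
    m = len(A_rot[0])
    col_max, traces = [], []
    for M in (A_rot, B_rot):
        s2 = spectral_norm(M) ** 2
        col_sq = [sum(row[i] ** 2 for row in M) / s2 for i in range(m)]  # ||q_i||^2 or ||r_i||^2
        col_max.append(max(col_sq))
        traces.append(sum(col_sq))  # k_A or k_B
    return m * max(col_max), traces[0], traces[1]
```

For instance, passing the single row `[[1.0, 0.0, 0.0, 0.0]]` for both arguments gives $\mu = 4 = m$ (all mass on one column), whereas the row `[[0.5, 0.5, 0.5, 0.5]]`, which is $e_1^\top H/\sqrt{4}$ for the Sylvester $H$, gives $\mu = 1 = k_A$, up to floating-point rounding.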
Proof. Observe that because $\Theta$ is orthogonal,

$$\|\widehat{AB^\top} - AB^\top\| = \|A\|\|B\| \left\| \frac{m}{n}\sum_{j=1}^n q_{i_j} r_{i_j}^\top - QR^\top \right\|.$$



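We now derive a high-probability bound for the last term on the right-hand side. Define a random symmetric matrix $X$ with

$$\Pr\left[ X = m \begin{bmatrix} 0 & q_i r_i^\top \\ r_i q_i^\top & 0 \end{bmatrix} \right] = \frac{1}{m}, \quad i \in [m],$$

and let $X_1, X_2, \ldots, X_n$ be independent copies of $X$. Define

$$\hat{M} := \frac{1}{n}\sum_{j=1}^n X_j \quad \text{and} \quad M := \begin{bmatrix} 0 & QR^\top \\ RQ^\top & 0 \end{bmatrix}.$$

Then

$$\|\hat{M} - M\| = \left\| \frac{1}{n}\sum_{j=1}^n (X_j - M) \right\| \stackrel{\text{distribution}}{=} \left\| \frac{m}{n}\sum_{j=1}^n q_{i_j} r_{i_j}^\top - QR^\top \right\|.$$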
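Observe that $\mathbb{E}[X - M] = 0$ and $\|X - M\| \le \|X\| + \|M\| \le \mu + 1$, since $\|X\| = m\|q_i\|\|r_i\| \le \mu$ and $\|M\| = \|QR^\top\| \le \|Q\|\|R\| = 1$. Moreover,

$$\mathbb{E}[X]^2 = M^2 = \begin{bmatrix} QR^\top RQ^\top & 0 \\ 0 & RQ^\top QR^\top \end{bmatrix},$$

$$\mathbb{E}[X^2] = \sum_{i=1}^m m \begin{bmatrix} \|r_i\|^2 q_i q_i^\top & 0 \\ 0 & \|q_i\|^2 r_i r_i^\top \end{bmatrix},$$

$$\operatorname{tr}(\mathbb{E}[X^2]) = 2m \sum_{i=1}^m \|q_i\|^2 \|r_i\|^2 \le 2\mu \sum_{i=1}^m \|q_i\| \|r_i\| \le 2\mu \sqrt{k_A k_B},$$

$$\|\mathbb{E}[X^2]\| \le m \max\left\{ \left\| \sum_{i=1}^m \|r_i\|^2 q_i q_i^\top \right\|, \left\| \sum_{i=1}^m \|q_i\|^2 r_i r_i^\top \right\| \right\} \le \mu \max\{\|QQ^\top\|, \|RR^\top\|\} = \mu,$$

$$\|\mathbb{E}[(X - M)^2]\| = \|\mathbb{E}[X^2] - M^2\| \le \mu + 1,$$

where the trace bound uses $m\|q_i\|\|r_i\| \le \mu$ and then the Cauchy-Schwarz inequality with $\sum_i \|q_i\|^2 = k_A$ and $\sum_i \|r_i\|^2 = k_B$.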
Therefore, by the matrix Bernstein inequality from [HKZ12a],

$$\Pr\left[ \|\hat{M} - M\| > \sqrt{\frac{2(\mu+1)t}{n}} + \frac{(\mu+1)t}{3n} \right] \le 2\sqrt{k_A k_B} \cdot \frac{t}{e^t - t - 1}.$$

The lemma follows. □
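The moment computations in the proof can be checked numerically by enumerating the $m$ equally likely outcomes of $X$. The following sketch (hypothetical helper names, plain Python lists as matrices) returns $\mathbb{E}[X]$ and $\mathbb{E}[X^2]$ exactly for given $Q$ and $R$, so that $\mathbb{E}[X] = M$ and $\operatorname{tr}(\mathbb{E}[X^2]) = 2m\sum_i \|q_i\|^2\|r_i\|^2$ can be confirmed on small examples.

```python
import math


def matmul(P, S):
    """Plain triple-loop product of list-of-lists matrices."""
    return [[sum(P[i][t] * S[t][j] for t in range(len(S))) for j in range(len(S[0]))]
            for i in range(len(P))]


def dilation(Y):
    """Symmetric dilation [[0, Y], [Y^T, 0]] of a d_A x d_B matrix Y."""
    dA, dB = len(Y), len(Y[0])
    Z = [[0.0] * (dA + dB) for _ in range(dA + dB)]
    for p in range(dA):
        for q in range(dB):
            Z[p][dA + q] = Z[dA + q][p] = Y[p][q]
    return Z


def moments(Q, R):
    """Exact E[X] and E[X^2] when X = m * dilation(q_i r_i^T) with i uniform on [m]."""
    m, size = len(Q[0]), len(Q) + len(R)
    EX = [[0.0] * size for _ in range(size)]
    EX2 = [[0.0] * size for _ in range(size)]
    for i in range(m):
        Xi = dilation([[m * q_row[i] * r_row[i] for r_row in R] for q_row in Q])
        Xi2 = matmul(Xi, Xi)
        for a in range(size):
            for b in range(size):
                EX[a][b] += Xi[a][b] / m  # each outcome has probability 1/m
                EX2[a][b] += Xi2[a][b] / m
    return EX, EX2
```

For example, with `Q = [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]` and `R = [[0.5, 0.5, 0.5 ** 0.5]]` (both with orthonormal rows, so $\|Q\| = \|R\| = 1$), the trace of $\mathbb{E}[X^2]$ is $2 \cdot 3 \cdot 0.75 = 4.5$ up to rounding, within the bound $2\mu\sqrt{k_A k_B} = 6\sqrt{2}$.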

The following lemma is a special case of a result found in [HKZ11]. 

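Lemma 2. Assume $\Theta = \frac{1}{\sqrt{m}} DH$, where $D = \operatorname{diag}(\epsilon)$, $\epsilon \in \{\pm 1\}^m$ is a vector of independent Rademacher random variables, and $H \in \{\pm 1\}^{m \times m}$ is a Hadamard matrix. Let $Z \in \mathbb{R}^{m \times d}$ be a matrix with $\|Z\| \le 1$, and set $k_Z := \operatorname{tr}(ZZ^\top)$. Then, for any $t > 0$,

$$\Pr\left[ \max\{\|Z^\top \Theta e_i\|^2 : i \in [m]\} > \frac{k_Z + 2\sqrt{k_Z(\ln(m) + t)} + 2(\ln(m) + t)}{m} \right] \le e^{-t},$$

where $e_i \in \{0, 1\}^m$ is the $i$-th coordinate axis vector in $\mathbb{R}^m$.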
Proof. Observe that for each $i \in [m]$, the random vector $\sqrt{m}\,\Theta e_i = D h_i$, where $h_i \in \{\pm 1\}^m$ is the $i$-th column of $H$, has the same distribution as $\epsilon$. Moreover, $\epsilon$ is a subgaussian random vector in the sense that $\mathbb{E}[\exp(\alpha^\top \epsilon)] \le \exp(\|\alpha\|^2/2)$ for any vector $\alpha \in \mathbb{R}^m$. Therefore, we may apply a tail bound for quadratic forms in subgaussian random vectors (e.g., [HKZ12b]) to obtain

$$\Pr\left[ \|\sqrt{m}\, Z^\top \Theta e_i\|^2 > \operatorname{tr}(ZZ^\top) + 2\sqrt{\operatorname{tr}((ZZ^\top)^2)\,\tau} + 2\|ZZ^\top\|\tau \right] \le e^{-\tau}$$

for each $i \in [m]$ and any $\tau > 0$. The lemma follows by observing that $\|ZZ^\top\| \le 1$ and $\operatorname{tr}((ZZ^\top)^2) \le \operatorname{tr}(ZZ^\top)\|ZZ^\top\| \le k_Z$, taking $\tau := \ln(m) + t$, and applying a union bound over all $i \in [m]$. □

We note that Lemma 2 holds for many other distributions of orthogonal matrices (with possibly worse constants). All that is required is that $\sqrt{m}\,\Theta e_i$ be a subgaussian random vector for each $i \in [m]$. See [HKZ11] for more discussion.
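Lemma 2 can also be probed by simulation. The sketch below (hypothetical names; Sylvester construction of $H$, with $m$ a power of two) computes $\|Z^\top\Theta e_i\|^2$ for all $i$ and estimates how often the maximum exceeds the threshold of Lemma 2. A useful sanity check is that these $m$ values always sum to $k_Z$, since $\sum_i \|Z^\top\Theta e_i\|^2 = \|Z^\top\Theta\|_F^2 = \operatorname{tr}(ZZ^\top)$ for orthogonal $\Theta$, so the threshold is the average $k_Z/m$ plus deviation terms. For small $m$ the threshold can exceed the trivial bound $\|Z^\top\Theta e_i\|^2 \le \|Z\|^2 \le 1$, in which case the estimated frequency is zero and the check is uninformative.

```python
import math
import random


def sylvester_hadamard(m):
    """Sylvester Hadamard matrix of order m (a power of two), entries +1/-1."""
    H = [[1]]
    while len(H) < m:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def column_sq_norms(Z, H, signs):
    """Return [||Z^T Theta e_i||^2 for i in [m]] with Theta = diag(signs) H / sqrt(m)."""
    m, d = len(Z), len(Z[0])
    vals = []
    for i in range(m):
        x = [signs[k] * H[k][i] for k in range(m)]  # sqrt(m) Theta e_i: a Rademacher sign vector
        proj = [sum(Z[k][j] * x[k] for k in range(m)) for j in range(d)]  # sqrt(m) Z^T Theta e_i
        vals.append(sum(p * p for p in proj) / m)
    return vals


def lemma2_threshold(k_z, m, t):
    """Threshold on max_i ||Z^T Theta e_i||^2 appearing in Lemma 2."""
    L = math.log(m) + t
    return (k_z + 2 * math.sqrt(k_z * L) + 2 * L) / m


def exceed_frequency(Z, t, trials, rng):
    """Monte Carlo estimate of the probability Lemma 2 bounds by e^{-t} (requires ||Z|| <= 1)."""
    m = len(Z)
    H = sylvester_hadamard(m)
    k_z = sum(v * v for row in Z for v in row)  # tr(Z Z^T) = squared Frobenius norm of Z
    thr = lemma2_threshold(k_z, m, t)
    hits = 0
    for _ in range(trials):
        signs = [rng.choice((-1, 1)) for _ in range(m)]
        hits += max(column_sq_norms(Z, H, signs)) > thr
    return hits / trials
```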

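Proof of Theorem 1. We apply Lemma 2 with $t := \ln(3/\delta)$ to both $Z = A^\top/\|A\|$ and $Z = B^\top/\|B\|$ (for which $k_Z = k_A \le k$ and $k_Z = k_B \le k$, respectively), and combine the implied probability bounds with a union bound to obtain

$$\Pr\left[ \mu > k + 2\sqrt{k \ln(3m/\delta)} + 2\ln(3m/\delta) \right] \le 2\delta/3,$$

where $\mu$ is defined in the statement of Lemma 1, and the probability is taken with respect to the random choice of $\Theta$. Now we apply Lemma 1, together with the bound $t/(e^t - t - 1) \le e^{-t/2}$ for $t \ge 2.6$, and substitute $t := 2\ln(6k/\delta)$ to obtain

$$\Pr\left[ \|\widehat{AB^\top} - AB^\top\| > \|A\|\|B\| \left( \sqrt{\frac{4(\mu+1)\ln(6k/\delta)}{n}} + \frac{2(\mu+1)\ln(6k/\delta)}{3n} \right) \right] \le \delta/3.$$

(Here $t \ge 2.6$ because $k \ge 1$ and $\delta < 1/3$, and $2\sqrt{k_A k_B}\, e^{-t/2} = 2\sqrt{k_A k_B}\,\delta/(6k) \le \delta/3$ because $k_A, k_B \le k$.) Combining the two probability bounds with a union bound implies the claim. □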
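The two numerical facts used in this proof can be checked directly: that $t/(e^t - t - 1) \le e^{-t/2}$ for $t \ge 2.6$ (verified here only on a finite grid, not proved), and that the choice $t = 2\ln(6k/\delta)$ keeps the failure probability of Lemma 1 at most $\delta/3$ when $k_A, k_B \le k$. A small sketch with hypothetical function names:

```python
import math


def bernstein_factor(t):
    """The factor t / (e^t - t - 1) appearing in Lemma 1."""
    return t / (math.exp(t) - t - 1)


def lemma1_failure_bound(k_a, k_b, k, delta):
    """Lemma 1's failure probability 2 sqrt(k_A k_B) t / (e^t - t - 1) at t = 2 ln(6k/delta)."""
    t = 2 * math.log(6 * k / delta)
    return 2 * math.sqrt(k_a * k_b) * bernstein_factor(t)


# Check t / (e^t - t - 1) <= e^{-t/2} on a grid of t values in [2.6, 62.6).
grid = [2.6 + 0.01 * j for j in range(6000)]
print(all(bernstein_factor(t) <= math.exp(-t / 2) for t in grid))  # True
print(bernstein_factor(2.5) <= math.exp(-1.25))  # False
```

The second check fails because the inequality narrowly breaks at $t = 2.5$, which is why a threshold slightly above 2.5 is used; with $k \ge 1$ and $\delta < 1/3$, the chosen $t$ is at least $2\ln 18 > 2.6$.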